{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1073ae28",
   "metadata": {},
   "source": [
    "# Lesson 3: Tune an LLM with RLHF"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07e8bc26",
   "metadata": {},
   "source": [
    "#### Project environment setup\n",
    "\n",
    "The RLHF training process has been implemented in a machine learning pipeline as part of the (Google Cloud Pipeline Components) library. This can be run on any platform that supports KubeFlow Pipelines (an open source framework), and can also run on Google Cloud's Vertex AI Pipelines.\n",
    "\n",
    "To run it locally, install the following:\n",
    "\n",
    "```Python\n",
    "!pip3 install google-cloud-pipeline-components\n",
    "!pip3 install kfp\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd3c7818",
   "metadata": {},
   "source": [
    "### Compile the pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f8289be-0f03-4f97-aaf3-b4e80330bce9",
   "metadata": {
    "height": 63
   },
   "outputs": [],
   "source": [
    "# Import (RLFH is currently in preview)\n",
    "from google_cloud_pipeline_components.preview.llm \\\n",
    "import rlhf_pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10ac4718-782d-4fe7-a39f-51fe5b423210",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Import from KubeFlow pipelines\n",
    "from kfp import compiler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef016774-5400-45be-9682-4b0b0daa9095",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Define a path to the yaml file\n",
    "RLHF_PIPELINE_PKG_PATH = \"rlhf_pipeline.yaml\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6abca02b-c43b-4301-8be0-a34949eb38b0",
   "metadata": {
    "height": 97
   },
   "outputs": [],
   "source": [
    "# Execute the compile function\n",
    "compiler.Compiler().compile(\n",
    "    pipeline_func=rlhf_pipeline,\n",
    "    package_path=RLHF_PIPELINE_PKG_PATH\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3904e2b-593c-4c95-b0c8-f7334a1fa728",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Print the first lines of the YAML file\n",
    "!head rlhf_pipeline.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "951da6e9",
   "metadata": {},
   "source": [
    "**Note**: to print the whole YAML file, use the following:\n",
    "```Python\n",
    "!cat rlhf_pipeline.yaml\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e14851c",
   "metadata": {},
   "source": [
    "## Define the Vertex AI pipeline job"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe574a96",
   "metadata": {},
   "source": [
    "### Define the location of the training and evaluation data\n",
    "Previously, the datasets were loaded from small JSONL files, but for typical training jobs, the datasets are much larger, and are usually stored in cloud storage (in this case, Google Cloud Storage).\n",
    "\n",
    "**Note:** Make sure that the three datasets are stored in the same Google Cloud Storage bucket.\n",
    "```Python\n",
    "parameter_values={\n",
    "        \"preference_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl\",\n",
    "        \"prompt_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl\",\n",
    "        \"eval_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl\",\n",
    "    ...\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6410f32-475f-4fc6-a751-5fe627312085",
   "metadata": {},
   "source": [
    "### Choose the foundation model to be tuned\n",
    "\n",
    "In this case, we are tuning the [Llama-2](https://ai.meta.com/llama/) foundational model, the LLM to tune is called **large_model_reference**. \n",
    "\n",
    "In this course, we're tuning the llama-2-7b, but you can also run an RLHF pipeline on Vertex AI to tune models such as: the T5x or text-bison@001. \n",
    "\n",
    "```Python\n",
    "parameter_values={\n",
    "        \"large_model_reference\": \"llama-2-7b\",\n",
    "        ...\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ceff91eb",
   "metadata": {},
   "source": [
    "### Calculate the number of reward model training steps\n",
    "\n",
    "**reward_model_train_steps** is the number of steps to use when training the reward model.  This depends on the size of your preference dataset. We recommend the model should train over the preference dataset for 20-30 epochs for best results.\n",
    "\n",
    "$$ stepsPerEpoch = \\left\\lceil \\frac{datasetSize}{batchSize} \\right\\rceil$$\n",
    "$$ trainSteps = stepsPerEpoch \\times numEpochs$$\n",
    "\n",
    "The RLHF pipeline parameters are asking for the number of training steps and not number of epochs. Here's an example of how to go from epochs to training steps, given that the batch size for this pipeline is fixed at 64 examples per batch.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1031a7d-ed33-451b-aac4-1e1b616826c4",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Preference dataset size\n",
    "PREF_DATASET_SIZE = 3000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7cbe0db-19de-4b0b-bfd7-8bd06a22defa",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Batch size is fixed at 64\n",
    "BATCH_SIZE = 64"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8c94da3-3016-4743-baeb-50bedd330328",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "import math"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2dfb58a8-9498-41b3-afc8-7ef81c4a7404",
   "metadata": {
    "height": 63
   },
   "outputs": [],
   "source": [
    "REWARD_STEPS_PER_EPOCH = math.ceil(PREF_DATASET_SIZE / BATCH_SIZE)\n",
    "print(REWARD_STEPS_PER_EPOCH)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bbe291d6-6d5c-4fb8-a783-61fb21269766",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "REWARD_NUM_EPOCHS = 30"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f83ffdc9-29d6-40e7-be30-06693816de63",
   "metadata": {
    "height": 63
   },
   "outputs": [],
   "source": [
    "# Calculate number of steps in the reward model training\n",
    "reward_model_train_steps = REWARD_STEPS_PER_EPOCH * REWARD_NUM_EPOCHS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f3df46-868f-470f-b4db-bbcd4ca6dfd2",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "print(reward_model_train_steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "831e866f-828e-4d5b-9d42-d206b57cb0b9",
   "metadata": {},
   "source": [
    "### Calculate the number of reinforcement learning training steps\n",
    "The **reinforcement_learning_train_steps** parameter is the number of reinforcement learning steps to perform when tuning the base model. \n",
    "- The number of training steps depends on the size of your prompt dataset. Usually, this model should train over the prompt dataset for roughly 10-20 epochs.\n",
    "- Reward hacking: if given too many training steps, the policy model may figure out a way to exploit the reward and exhibit undesired behavior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6880ae3-8977-4605-9b8d-e50013c03896",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Prompt dataset size\n",
    "PROMPT_DATASET_SIZE = 2000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08e1f705-9ff2-4204-b4f9-96bcaacf798c",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Batch size is fixed at 64\n",
    "BATCH_SIZE = 64"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "898fd671-e3f1-4174-8c63-dd64af6baa1d",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "import math"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30281c30-3b29-4ec0-b7fc-0df4f0203349",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "RL_STEPS_PER_EPOCH = math.ceil(PROMPT_DATASET_SIZE / BATCH_SIZE)\n",
    "print(RL_STEPS_PER_EPOCH)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3415a4c5-c816-46e1-8b1e-2b6b976f2653",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "RL_NUM_EPOCHS = 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d16b09f-5e48-4c2e-bb33-0b719e0943fa",
   "metadata": {
    "height": 63
   },
   "outputs": [],
   "source": [
    "# Calculate the number of steps in the RL training\n",
    "reinforcement_learning_train_steps = RL_STEPS_PER_EPOCH * RL_NUM_EPOCHS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b4079f9-0816-46c3-937f-3c85a491eb22",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "print(reinforcement_learning_train_steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e83b95cf-9f6f-45c2-810f-363f761a235b",
   "metadata": {},
   "source": [
    "### Define the instruction\n",
    "\n",
    "- Choose the task-specific instruction that you want to use to tune the foundational model.  For this example, the instruction is \"Summarize in less than 50 words.\"\n",
    "- You can choose different instructions, for example, \"Write a reply to the following question or comment.\" Note that you would also need to collect your preference dataset with the same instruction added to the prompt, so that both the responses and the human preferences are based on that instruction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14cbd66c-aeb2-4bf8-97f0-eba82e5de51e",
   "metadata": {
    "height": 301
   },
   "outputs": [],
   "source": [
    "# Completed values for the dictionary\n",
    "parameter_values={\n",
    "        \"preference_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl\",\n",
    "        \"prompt_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl\",\n",
    "        \"eval_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl\",\n",
    "        \"large_model_reference\": \"llama-2-7b\",\n",
    "        \"reward_model_train_steps\": 1410,\n",
    "        \"reinforcement_learning_train_steps\": 320, # results from the calculations above\n",
    "        \"reward_model_learning_rate_multiplier\": 1.0,\n",
    "        \"reinforcement_learning_rate_multiplier\": 1.0,\n",
    "        \"kl_coeff\": 0.1, # increased to reduce reward hacking\n",
    "        \"instruction\":\\\n",
    "    \"Summarize in less than 50 words\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "104c88a7",
   "metadata": {},
   "source": [
    "### Train with full dataset: dictionary 'parameter_values' \n",
    "\n",
    "- Adjust the settings for training with the full dataset to achieve optimal results in the evaluation (next lesson). Take a look at the new values; these results are from various training experiments in the pipeline, and the best parameter values are displayed here.\n",
    "\n",
    "```python\n",
    "parameter_values={\n",
    "        \"preference_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text/summarize_from_feedback_tfds/comparisons/train/*.jsonl\",\n",
    "        \"prompt_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/train/*.jsonl\",\n",
    "        \"eval_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/val/*.jsonl\",\n",
    "        \"large_model_reference\": \"llama-2-7b\",\n",
    "        \"reward_model_train_steps\": 10000,\n",
    "        \"reinforcement_learning_train_steps\": 10000, \n",
    "        \"reward_model_learning_rate_multiplier\": 1.0,\n",
    "        \"reinforcement_learning_rate_multiplier\": 0.2,\n",
    "        \"kl_coeff\": 0.1,\n",
    "        \"instruction\":\\\n",
    "    \"Summarize in less than 50 words\"}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8bc6d51",
   "metadata": {},
   "source": [
    "### Set up Google Cloud to run the Vertex AI pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9571014",
   "metadata": {},
   "source": [
    "Vertex AI is already installed in this classroom environment.  If you were running this on your own project, you would install Vertex AI SDK like this:\n",
    "```Python\n",
    "!pip3 install google-cloud-aiplatform\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a820b9b6-d93c-492c-8e0f-7570d4bc67c1",
   "metadata": {
    "height": 114
   },
   "outputs": [],
   "source": [
    "# Authenticate in utils\n",
    "from utils import authenticate\n",
    "credentials, PROJECT_ID, STAGING_BUCKET = authenticate()\n",
    "\n",
    "# RLFH pipeline is available in this region\n",
    "REGION = \"europe-west4\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cf92aac",
   "metadata": {},
   "source": [
    "## Run the pipeline job on Vertex AI\n",
    "\n",
    "Now that we have created our dictionary of values, we can create a PipelineJob. This just means that the RLHF pipeline will execute on Vertex AI. So it's not running locally here in the notebook, but on some server on Google Cloud."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba1642b4-d16d-4e9b-ba87-3391168c5a11",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "import google.cloud.aiplatform as aiplatform"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "427dc778-9a04-4816-8569-0b5ea1c39f14",
   "metadata": {
    "height": 63
   },
   "outputs": [],
   "source": [
    "aiplatform.init(project = PROJECT_ID,\n",
    "                location = REGION,\n",
    "                credentials = credentials)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3eac3e3-2d17-47d7-b69f-97a20e91042b",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "# Look at the path for the YAML file\n",
    "RLHF_PIPELINE_PKG_PATH"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fa58dce",
   "metadata": {},
   "source": [
    "### Create and run the pipeline job\n",
    "- Here is how you would create the pipeline job and run it if you were working on your own project.\n",
    "- This job takes about a full day to run with multiple accelerators (TPUs/GPUs), and so we're not going to run it in this classroom."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7a3cf10",
   "metadata": {},
   "source": [
    "- To create the pipeline job:\n",
    "\n",
    "```Python\n",
    "job = aiplatform.PipelineJob(\n",
    "    display_name=\"tutorial-rlhf-tuning\",\n",
    "    pipeline_root=STAGING_BUCKET,\n",
    "    template_path=RLHF_PIPELINE_PKG_PATH,\n",
    "    parameter_values=parameter_values)\n",
    "```\n",
    "- To run the pipeline job:\n",
    "\n",
    "```Python\n",
    "job.run()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85bec5c2-7bc7-48f5-93e8-9bb9e5486000",
   "metadata": {},
   "source": [
    "- The content team has run this RLHF training pipeline to tune the Llama-2 model, and in the next lesson, you'll get to evaluate the log data to compare the performance of the tuned model with the original foundational model."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
